Computing Discourse Information with Statistical Methods
Abstract
This dissertation research involves implementing a computer system that, given a natural language dialogue, automatically tags each utterance with a discourse label (a concise abstraction of the intentional function of the speaker) and a discourse pointer (a focusing mechanism that represents the dialogue context in which an utterance is to be understood) (Samuel 1996). Since the discourse label of an utterance depends on the surrounding dialogue, tagging utterances with discourse labels is similar to the part-of-speech (PoS) tagging problem in syntax. Within the domain of PoS tagging, extensive experimental research has shown that statistical learning algorithms are among the most successful. I will investigate two methods that have been effective in PoS tagging: Hidden Markov Models (HMMs) (Charniak 1993) and Transformation-Based Learning (TBL) (Brill 1995). Unlike these PoS taggers, which determine a word's tag based on the surrounding words (within a fixed window size), a discourse-tagging system must use the surrounding utterances as input. The sparse data problem is therefore much more severe for the discourse tagger, since the number of possible utterances is infinite. To alleviate this problem, rather than processing each utterance verbatim (which would probably flood the system with extraneous information that is not relevant to the task), I have identified a small set of features that can be extracted from each utterance to provide the relevant information to the learning algorithm. Because HMMs and TBL deal with contiguous sequences of discourse labels, they are unable to take focus shifts into consideration, yet it is crucial to account for the focus shifts that frequently occur in discourse. I have proposed a solution to this problem for both algorithms. For HMMs, this involves modifying the Markov assumption slightly, while still retaining the linear-time efficiency of the HMM approach. With TBL, the solution is more straightforward.
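To make the analogy with PoS tagging concrete, the following is a minimal sketch of HMM-style discourse tagging: each utterance is reduced to a single coarse feature, and Viterbi decoding picks the most likely sequence of discourse labels. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the label set (`SUGGEST`, `ACCEPT`, `REJECT`), the cue-word features, and the hand-set probabilities. The sketch uses a standard first-order HMM, so it does not model discourse pointers or the modified Markov assumption for focus shifts described above.

```python
import math
import re

# Hypothetical label set and hand-set probabilities, for illustration only.
LABELS = ["SUGGEST", "ACCEPT", "REJECT"]

START = {"SUGGEST": 0.6, "ACCEPT": 0.2, "REJECT": 0.2}
TRANS = {
    "SUGGEST": {"SUGGEST": 0.2, "ACCEPT": 0.4, "REJECT": 0.4},
    "ACCEPT": {"SUGGEST": 0.6, "ACCEPT": 0.3, "REJECT": 0.1},
    "REJECT": {"SUGGEST": 0.7, "ACCEPT": 0.1, "REJECT": 0.2},
}
EMIT = {
    "SUGGEST": {"proposal_cue": 0.7, "positive_cue": 0.1,
                "negative_cue": 0.1, "other": 0.1},
    "ACCEPT": {"proposal_cue": 0.1, "positive_cue": 0.7,
               "negative_cue": 0.05, "other": 0.15},
    "REJECT": {"proposal_cue": 0.1, "positive_cue": 0.05,
               "negative_cue": 0.7, "other": 0.15},
}


def extract_feature(utterance):
    """Reduce an utterance to one coarse cue feature (assumed feature set)."""
    text = utterance.lower().strip()
    words = set(re.findall(r"[a-z']+", text))
    if text.endswith("?") or "how about" in text:
        return "proposal_cue"
    if words & {"yes", "okay", "sure"}:
        return "positive_cue"
    if words & {"no", "sorry"}:
        return "negative_cue"
    return "other"


def viterbi(features):
    """Return the most likely label sequence for a list of features."""
    if not features:
        return []
    # Log-probability of the best path ending in each label at step 0.
    scores = [{lab: math.log(START[lab]) + math.log(EMIT[lab][features[0]])
               for lab in LABELS}]
    backpointers = []
    for feat in features[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for lab in LABELS:
            best_prev = max(LABELS,
                            key=lambda p: prev[p] + math.log(TRANS[p][lab]))
            cur[lab] = (prev[best_prev] + math.log(TRANS[best_prev][lab])
                        + math.log(EMIT[lab][feat]))
            ptr[lab] = best_prev
        scores.append(cur)
        backpointers.append(ptr)
    # Follow the backpointers from the best final label.
    path = [max(LABELS, key=lambda lab: scores[-1][lab])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def tag_dialogue(utterances):
    """Tag each utterance in a dialogue with a discourse label."""
    return viterbi([extract_feature(u) for u in utterances])


dialogue = [
    "How about meeting on Tuesday?",
    "No, sorry, I am busy then.",
    "How about Wednesday?",
    "Okay, that works.",
]
print(tag_dialogue(dialogue))
```

Decoding runs in time linear in the number of utterances, which is the efficiency property the abstract says the modified HMM retains. In a real system the probabilities would be estimated from a labeled dialogue corpus, not set by hand.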